Skip to content

feat: end-to-end Keycloak authentication, RBAC, and Python SDK integration#367

Open
mahil-2040 wants to merge 15 commits into
volcano-sh:mainfrom
mahil-2040:feat/keycloak-deployment-realm-setup
Open

feat: end-to-end Keycloak authentication, RBAC, and Python SDK integration#367
mahil-2040 wants to merge 15 commits into
volcano-sh:mainfrom
mahil-2040:feat/keycloak-deployment-realm-setup

Conversation

@mahil-2040

@mahil-2040 mahil-2040 commented May 28, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR introduces end-to-end external user authentication and authorization using Keycloak as the identity provider. It expands the initial scope to include the complete flow in a single PR: Keycloak deployment, Router integration, Resource-Level Access Control (RLAC), Python SDK support, and the formal design proposal.

1. Design Proposal

  • Adds docs/design/keycloak-proposal.md detailing the architecture, request flow, and implementation decisions for the external authentication layer.

2. Keycloak Deployment

  • Declarative realm import with agentcube realm, role hierarchy (sandbox:invokesandbox:manageadmin), and OAuth2 clients.
  • Production-hardened Deployment with security contexts, health probes, and Helm validation guards.

3. Router OIDC Integration & RBAC

  • OIDC token validator using go-oidc with automatic JWKS key caching.
  • Authentication middleware validates Bearer JWT tokens from Keycloak.
  • RBAC middleware enforces sandbox:invoke realm role on all /v1/* endpoints.
  • Feature-gated behind --enable-external-auth (default: off, zero impact).

4. Downstream Identity Forwarding & RLAC

  • Router embeds user identity (sub) in internal PicoD JWT claims.
  • Router sends X-AgentCube-User-Id header to WorkloadManager.
  • WorkloadManager (RLAC): Tags newly created sandbox CRDs with the agentcube.io/owner: <user_id> label.
  • Router (RLAC): Verifies that the caller's JWT subject matches the sandbox's OwnerID before proxying requests.

5. Python SDK Auth Integration

  • Introduces pluggable auth provider pattern (AuthProvider protocol).
  • Adds ServiceAccountAuth for client_credentials flow with automatic token refresh.
  • Adds TokenAuth for static tokens.
  • Updates clients (Control Plane and Data Plane) to seamlessly handle token injection.

6. Testing

  • Unit tests for OIDC validator, auth middleware, RBAC, identity forwarding, and SDK auth.
  • E2E tests deploying real Keycloak in Kind cluster to validate unauthenticated rejection, invalid tokens, valid invocations, and RLAC ownership blocks.

Which issue(s) this PR fixes:
Fixes Part of #243

Special notes for your reviewer:

  • External auth is opt-in via keycloak.enabled=true. Existing deployments are unaffected.
  • Token Exchange (RFC 8693) is not used - the Router forwards the user identity directly since mTLS provides transport-level security between components.
  • New Go dependency: github.com/coreos/go-oidc/v3 (used by Kubernetes itself).

Does this PR introduce a user-facing change?:

yes

Add Keycloak OIDC authentication support, Resource-Level Access Control (RLAC), and Python SDK auth integration. When enabled via `keycloak.enabled=true`, the Router requires a valid Bearer token with the `sandbox:invoke` realm role, and users can only access sandboxes they created. The Python SDK now supports seamless `client_credentials` authentication with automatic token refresh.

Copilot AI review requested due to automatic review settings May 28, 2026 18:25
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Keycloak deployment, service, and realm configuration to the Helm chart for external user authentication. The reviewer identified several critical security and configuration issues: wildcard redirect URIs and web origins on the confidential SDK client should be configurable; the admin password should be sourced from a Kubernetes Secret instead of plaintext; production mode lacks support for external databases and proxy/HTTPS settings; SSL requirements should be dynamically enabled; and global image pull secrets need to be propagated to the Keycloak pod template.

Comment thread manifests/charts/base/templates/keycloak/keycloak-realm.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak-realm.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an optional Keycloak deployment to the Helm base chart to support external user authentication, including default values, a Deployment/Service, and a realm-import ConfigMap.

Changes:

  • Introduces keycloak configuration block in chart values.
  • Adds Keycloak Deployment + Service template gated by keycloak.enabled.
  • Adds a realm JSON import ConfigMap template for bootstrapping clients/roles.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File Description
manifests/charts/base/values.yaml Adds configurable values for enabling and running a Keycloak instance
manifests/charts/base/templates/keycloak/keycloak.yaml Adds Deployment/Service templates for Keycloak
manifests/charts/base/templates/keycloak/keycloak-realm.yaml Adds ConfigMap for importing an initial realm configuration

Comment thread manifests/charts/base/values.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak-realm.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak-realm.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak-realm.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
@codecov-commenter

codecov-commenter commented May 28, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 61.75549% with 122 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.14%. Comparing base (524e55e) to head (fe1b037).
⚠️ Report is 121 commits behind head on main.

Files with missing lines Patch % Lines
pkg/router/auth.go 45.71% 36 Missing and 2 partials ⚠️
pkg/router/oidc.go 64.28% 21 Missing and 14 partials ⚠️
pkg/router/server.go 19.23% 17 Missing and 4 partials ⚠️
pkg/workloadmanager/identity.go 59.52% 13 Missing and 4 partials ⚠️
pkg/router/handlers.go 81.81% 5 Missing and 1 partial ⚠️
pkg/router/session_manager.go 85.00% 2 Missing and 1 partial ⚠️
pkg/workloadmanager/handlers.go 90.90% 1 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #367       +/-   ##
===========================================
+ Coverage   47.57%   58.14%   +10.57%     
===========================================
  Files          30       37        +7     
  Lines        2819     3474      +655     
===========================================
+ Hits         1341     2020      +679     
+ Misses       1338     1244       -94     
- Partials      140      210       +70     
Flag Coverage Δ
unittests 58.14% <61.75%> (+10.57%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread manifests/charts/addons/keycloak/templates/realm-config.yaml
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated

@hzxuzhonghu hzxuzhonghu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before adding a bunch of yamls, can you first add a docs exmplain about the proposal and the workflow

@mahil-2040

mahil-2040 commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

before adding a bunch of yamls, can you first add a docs exmplain about the proposal and the workflow

@hzxuzhonghu @acsoto
Yeah sure, it's a good suggestion, so now should i delete this PR, and make a fresh PR in which i will first make a doc explaining about the proposal and the workflow, and when it is approved then only i will push the .yaml code, or should i continue with this only this time, and follow this approach of making a doc from next feature?

@mahil-2040 mahil-2040 requested a review from acsoto June 1, 2026 06:00
@acsoto

acsoto commented Jun 1, 2026

Copy link
Copy Markdown
Member

Either is ok

- Declarative Realm Import: Created a ConfigMap with the realm JSON defining
  clients (agentcube-sdk, agentcube-router, workloadmanager, agentcube-admin)
  and role hierarchy (sandbox:invoke -> sandbox:manage -> admin).
- Deployment & Service: Added Keycloak Deployment and Service manifests
  with management-port health probes and a 300s startupProbe window.
- Helm Integration: Added keycloak configuration values to values.yaml,
  gated behind 'keycloak.enabled: false' to ensure zero impact by default.

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
- Parameterized SDK redirect URIs with secure defaults, added support for secretKeyRef for admin/database credentials, and made sslRequired enforcement dynamic based on devMode.
- Added configuration blocks for external databases (PostgreSQL/MySQL) and reverse proxy settings (headers, hostname).
- Propagated global imagePullSecrets, decoupled Service targetPort by using the named 'http' port, and dynamically derived  the realm import filename from Helm values.

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
- Added fail-fast validation for database and proxy settings in production mode
- Decoupled credentials into chart-managed Secrets with required-value checks
- Added restricted securityContext for Keycloak pod and container
- Migrated realm configuration to Keycloak 26 default roles and client secrets

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copilot AI review requested due to automatic review settings June 2, 2026 17:43
@mahil-2040 mahil-2040 force-pushed the feat/keycloak-deployment-realm-setup branch from 5f5d385 to 9f8f5df Compare June 2, 2026 17:43
@mahil-2040

Copy link
Copy Markdown
Contributor Author

Either is ok

Okay i will continue in this one only for this time.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

Comment thread manifests/charts/base/templates/keycloak/keycloak-secrets.yaml Outdated
Comment thread manifests/charts/base/values.yaml Outdated
Comment thread manifests/charts/base/values.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak-realm.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
Comment thread manifests/charts/base/templates/keycloak/keycloak.yaml Outdated
- Require explicit Keycloak admin and client credentials
- Enforced production Secret and external DB validation
- Fixed Secret key references and quote realm import filename

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
@hzxuzhonghu

Copy link
Copy Markdown
Member

@mahil-2040 I donot see your design

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
@mahil-2040 mahil-2040 changed the title feat: deployed Keycloak and configure agentcube feat: end-to-end Keycloak authentication, RLAC, and Python SDK integration Jun 3, 2026
Copilot AI review requested due to automatic review settings June 3, 2026 18:16

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 9 comments.

Comment thread pkg/router/oidc.go
Comment thread pkg/router/session_manager.go
Comment thread pkg/router/session_manager.go
Comment thread pkg/router/oidc.go Outdated
Comment thread pkg/router/oidc.go Outdated
Comment thread pkg/router/oidc.go Outdated
Comment thread pkg/router/session_manager.go
Comment thread pkg/router/oidc_test.go Outdated
Comment thread pkg/router/oidc_test.go Outdated
…dentity header

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
…nforcement

- Enforced resource-level access control (RLAC) in backend and reject router JWTs missing sub claim.
- Added Python SDK pluggable auth providers and routed data plane traffic via Router proxy.
- Wired Keycloak realm imports, client secrets, and service account roles dynamically.
- Added OIDC E2E tests

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copilot AI review requested due to automatic review settings June 5, 2026 22:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 42 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (3)

sdk-python/agentcube/auth.py:1

  • Two concrete failure modes here: (1) if access_token is missing, this raises KeyError or later returns "", which can silently produce Authorization: Bearer and hard-to-debug 401s; raise a clear exception when the token field is missing/empty. (2) if expires_in is less than the refresh buffer (30s), _expires_at becomes “in the past” and every get_token() call will refetch; clamp the computed expiry so it never goes backward (e.g., ensure at least a small positive TTL).
    test/e2e/run_e2e.sh:1
  • Token acquisition has no validation/retries: if the port-forward isn’t ready or Keycloak returns an error page, jq -r can produce null/empty, and the suite will proceed with invalid tokens (causing confusing downstream failures). Add a small retry loop (or explicit HTTP status checks) and fail fast if either token is empty/null.
    sdk-python/tests/test_auth.py:1
  • time is imported but not used in this test file. Removing it keeps the test module clean and avoids linter noise.

Comment thread pkg/router/session_manager.go
Comment thread pkg/workloadmanager/identity.go
Comment thread pkg/workloadmanager/workload_builder.go
Comment thread pkg/workloadmanager/workload_builder.go
…d fixed python lint issue

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
@mahil-2040 mahil-2040 requested a review from hzxuzhonghu June 6, 2026 05:26
@mahil-2040

Copy link
Copy Markdown
Contributor Author

@acsoto @hzxuzhonghu Please review this!!

Comment thread cmd/router/main.go Outdated
Comment thread docs/agentcube/docs/tutorials/external-auth-keycloak.md
Comment thread docs/agentcube/docs/tutorials/external-auth-keycloak.md Outdated
Comment on lines +41 to +49
existingSecretServiceKey: "client-service-secret"
existingSecretRouterKey: "client-router-secret"
existingSecretAdminKey: "client-admin-secret"
service:
secret: ""
router:
secret: ""
admin:
secret: ""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain what's these secret are, and how to setup them

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are OAuth2 client secrets for the confidential Keycloak clients created by the addon chart. They are not Kubernetes service account tokens. Keycloak uses them when a caller requests an access token via the client_credentials grant.

We have three confidential clients:

  1. agentcube-service for SDK/server-side token requests,
  2. agentcube-router reserved for Router service-to-service flows, and
  3. agentcube-admin for administrative tokens.

There are two setup options:

  1. For dev/test, set the values directly:
    clients.service.secret, clients.router.secret, and clients.admin.secret. The chart creates the keycloak-credentials Kubernetes Secret.

  2. For production, create a Kubernetes Secret yourself and set clients.existingSecret. The existingSecretServiceKey, existingSecretRouterKey, and existingSecretAdminKey values tell the chart which keys inside that Secret contain each client secret.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the user guide, we should have a step by step how to generate them. I still not get the secret format, x509 or other format?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the tutorial to clarify the secret format and generation. The client secrets are plain-text OAuth2 shared secrets (similar to API keys). Added a step showing how to generate strong secrets for production using openssl rand -base64 32.

Comment thread docs/agentcube/docs/tutorials/external-auth-keycloak.md
Comment thread docs/design/keycloak-proposal.md Outdated

| Client ID | Type | Purpose |
|-----------|------|---------|
| `agentcube-service` | Confidential (`client_credentials`) | External backend applications and automated scripts using the Python SDK (Machine-to-Machine / M2M) |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusing,
does it mean caller to agentcube-service?

Or agentcube-service is the client?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agentcube-service is an OAuth2 client ID registered in Keycloak. External backend apps and SDK users authenticate as this client to get a token, which they then send to the AgentCube Router. The old table mixed the client identity with who uses it. I've split it into separate "Who uses it" and "Purpose" columns in the updated table to make this clearer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it by OAuth2 client ID

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use professional words

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, i will keep that in mind.

Comment thread docs/design/keycloak-proposal.md Outdated
Comment thread manifests/charts/addons/keycloak/values.yaml Outdated

@hzxuzhonghu hzxuzhonghu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not read the implement, please address these comments first and help me understand some concepts. Thank you

… and updated related documentation

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copilot AI review requested due to automatic review settings June 9, 2026 08:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 42 changed files in this pull request and generated 2 comments.

Comment on lines +242 to +264
// setIdentityHeader signs and attaches the caller's identity as a JWT.
func (m *manager) setIdentityHeader(ctx context.Context, req *http.Request) {
if m.jwtManager == nil {
return
}
claims, ok := ctx.Value(contextKeyOIDCClaims).(*Claims)
if !ok || claims == nil {
return
}
identityClaims := map[string]interface{}{
"sub": claims.Subject,
"aud": "workloadmanager",
}
if claims.Email != "" {
identityClaims["email"] = claims.Email
}
identityToken, err := m.jwtManager.GenerateToken(identityClaims)
if err != nil {
klog.Warningf("Failed to sign identity JWT for WM: %v", err)
return
}
req.Header.Set("X-AgentCube-User-Identity", identityToken)
}
klog.Warningf("Failed to sign identity JWT for WM: %v", err)
return
}
req.Header.Set("X-AgentCube-User-Identity", identityToken)
@mahil-2040 mahil-2040 requested a review from hzxuzhonghu June 9, 2026 12:41
"jti": "31ef712f-f62a-4a4e-8714-7ccf710ab476",
"iss": "http://keycloak.agentcube.svc.cluster.local:8080/realms/agentcube",
"aud": "agentcube-api",
"sub": "bb0f4c04-e1d2-4bdb-b7bb-56d5c1cefe50",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we specify the user identity when signing jwt token

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sub claim is not specified by the caller, Keycloak sets it automatically. In this example we use the client_credentials grant, so the sub is the UUID that Keycloak assigns to the service account of the agentcube-service client. For real users (e.g. via authorization_code grant), sub would be the user's UUID from Keycloak. The caller cannot forge or choose the sub, it is signed into the JWT by Keycloak.


| Client ID | Type | Who uses it | Purpose |
|-----------|------|-------------|---------|
| `agentcube-service` | Confidential (`client_credentials`) | External backend services, CI jobs, automation scripts, and SDK users that can safely store a client secret | Main machine-to-machine client for calling AgentCube APIs. Tokens from this client receive `sandbox:invoke` and are used by `ServiceAccountAuth` in the Python SDK. |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually agentcube clients that access agentcube router, so agentcube-service is not suitable, any other app client name make sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, agentcube-service is misleading since it's not an internal AgentCube service but an OAuth2 client for external applications accessing the Router API.

Would one of these work better?

app-client or api-client

Or if you prefer something else please tell me i will update the code according to that.

| Client ID | Type | Who uses it | Purpose |
|-----------|------|-------------|---------|
| `agentcube-service` | Confidential (`client_credentials`) | External backend services, CI jobs, automation scripts, and SDK users that can safely store a client secret | Main machine-to-machine client for calling AgentCube APIs. Tokens from this client receive `sandbox:invoke` and are used by `ServiceAccountAuth` in the Python SDK. |
| `agentcube-sdk` | Public (`authorization_code` + PKCE) | Human users authenticating from a CLI, browser, or other installed app where a client secret cannot be safely stored | Interactive login client. It uses PKCE(Proof Key for Code Exchange) instead of a client secret and is intended for future browser/device/CLI login flows. |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain the workflow?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also confusing, since it seems has nothing to do with sdk

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the name agentcube-sdk is also misleading as the Python SDK actually uses the current agentcube-service client, not this one.

This is a public OAuth2 client for future interactive human login flows (CLI, browser) where a client secret cannot be safely stored. The authorization_code + PKCE workflow is:

  1. User runs a CLI command or opens a browser app
  2. App redirects user to the Keycloak login page
  3. User enters username/password in Keycloak
  4. Keycloak redirects back with a one-time authorization code
  5. App exchanges the code for an access token, using a PKCE code verifier. random challenge generated at step (2) instead of a client secret.

This flow is not implemented yet. it's a placeholder for future interactive authentication. Should I rename it to something clearer like agentcube-cli or agentcube-login or you have some better name then please let me know.


Confidential client secrets can be provided securely via an existing Kubernetes Secret (`keycloak.clients.existingSecret`) to prevent leaking them in Helm values. They are injected into the Keycloak pod as environment variables and securely substituted into the realm JSON during import. The `agentcube-sdk` client is a **public client** (no secret) that uses authorization code with PKCE for interactive flows, following RFC 8252 (OAuth 2.0 for Native Apps). The confidential clients are used only where a secret can be stored securely, and each client ID is separated by trust boundary so service, internal Router, and admin credentials can be scoped and rotated independently.

Both clients include a **hardcoded audience protocol mapper** (`oidc-audience-mapper`) that injects `agentcube-api` into the access token's `aud` claim. The Router validates `aud = "agentcube-api"`. This follows OAuth2 convention — the audience identifies the resource server (the Router API), not the client that requested the token.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both clients means what. I can guess, but not clear


type Claims struct {
Subject string // from standard "sub" claim
Email string // from standard "email" claim

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can user add email claim when using keycloak? Can you point where it is documented

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The email claim is a standard OIDC claim, no special setup is needed. Keycloak includes it automatically via the built-in email client scope, which is a default scope for all clients. The user just needs to have an email address set in their Keycloak profile.

For service accounts (client_credentials grant), there is no real user, so email will be empty. The Router handles this gracefully, it only forwards the email to sandboxes when present (oidc.go:115, handlers.go:291). It is not a required claim.

See Keycloak docs: Server Administration - Client Scopes

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a static config, can we still use keycloak api to dynamically register other roles?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The static realm config is only used for initial bootstrap on first startup. After that, roles, clients, users, scopes, etc. can be managed dynamically through:

  1. Keycloak Admin REST API. : e.g., create a new role:
POST /admin/realms/agentcube/roles
{"name": "custom:role"}
  1. Keycloak Admin Console, web UI at /admin for point-and-click management.

The Router does not need any changes when new roles are added, it only checks that the configured requiredRole exists in the token's role claim. Any additional roles can be used for application-level authorization downstream.

YOu can check here: Keycloak Admin REST API - Roles

@hzxuzhonghu hzxuzhonghu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the Keycloak auth / RLAC changes. Overall the OIDC validation, RBAC middleware, and SDK auth look solid. Main concern: the sandbox CRD is never actually tagged with the owner label/annotation, even though the PR description and the builder code suggest it should be. RLAC still works because the owner is persisted in the store, but the kubectl-level owner labels won't be present. Left a few inline notes.


// Set ownership from the Router-signed identity JWT
if ownerID := extractOwnerID(c.Request); ownerID != "" {
sandboxEntry.OwnerID = ownerID

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The owner is set on sandboxEntry here, but the sandbox CR object was already built above (lines 108/110) before we know the owner. So buildSandboxParams.ownerID stays empty and the agentcube.io/owner annotation / agentcube.io/owner-hash label are never applied to the actual CRD. RLAC still works via the store's OwnerID, but the CR labeling described in the PR doesn't happen. Consider extracting ownerID before building, then passing it into the builder.


// Ownership metadata for RLAC
if params.ownerID != "" {
sandbox.ObjectMeta.Annotations["agentcube.io/owner"] = params.ownerID

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is effectively dead code today: params.ownerID is never assigned anywhere, so this if is always false and the owner annotation/label is never written. The owner only ends up on the sandboxEntry (store), not on the CR. Wiring ownerID into buildSandboxParams/buildSandboxClaimParams would make this work as intended.

sub, err := verifyIdentityJWT(publicKey, rawToken)
if err != nil {
klog.V(2).Infof("Identity JWT verification failed: %v", err)
return ""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractOwnerID fails open here: if the public key isn't cached yet or verification fails, it returns an empty owner and the sandbox is created with no owner. On the Router side, RLAC fails closed (no owner = access denied), so such a sandbox becomes permanently unreachable by its creator. Worth deciding on one consistent behavior, or at least logging at a higher level so this is easy to spot.

Comment thread pkg/router/oidc.go
if expiry == 0 {
return nil, fmt.Errorf("token missing required exp claim")
}
if time.Now().After(time.Unix(int64(expiry), 0)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: exp/nbf are compared against time.Now() with no clock-skew leeway. A small allowance (e.g. 30s) avoids spurious rejections when the IdP and Router clocks drift slightly.

body = resp.json()
self._token = body["access_token"]
expires_in = int(body.get("expires_in", 3600))
self._expires_at = time.monotonic() + expires_in - _REFRESH_BUFFER_SECONDS

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if the IdP returns expires_in <= 30, _expires_at lands in the past and every get_token() call will re-fetch a token. Clamping the buffer (e.g. max(expires_in - buffer, expires_in / 2)) would avoid hammering the token endpoint for short-lived tokens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants